ABSTRACT


Proper location M-estimates for a model with non-Gaussian
autoregressive-moving average type errors are genuine maximum
likelihood type estimates, whereas ordinary location M-estimates
are those introduced by P. Huber for independent and identically
distributed errors. The relative behavior of ordinary location
M-estimates and proper location M-estimates is studied for situations
with dependent errors of purely autoregressive and purely moving
average type. It is shown through asymptotic calculations and
finite-sample size Monte Carlo studies that although ordinary
location M-estimates are adequate for weak dependency structure,
they can be quite inefficient compared with proper M-estimates of
1. INTRODUCTION

By now, P. Huber's (1964) M-estimates of location are well known. These
estimates were introduced in the context of obtaining robust estimates of
location $\mu$ for independent and identically distributed observations
$Y_1, Y_2, \ldots, Y_n$. For reasons which become clear in the next section we
refer to Huber's estimates as ordinary location M-estimates, and label them
$\hat{\mu}_{OM}$. An ordinary location M-estimate is obtained by solving

$$\sum_{t=1}^{n} \psi\left(\frac{Y_t - \hat{\mu}_{OM}}{c \cdot \hat{s}_y}\right) = 0 \tag{1.1}$$

with a good algorithm, where $\hat{s}_y$ is a consistent robust estimate of the
scale $s_y$ of the $Y_t$, $c$ is a tuning constant and $\psi$ is a robustifying
psi-function. With $\psi = \rho'$, this estimating equation characterizes a
stationary point of the minimization problem

$$\min_{\mu'} \sum_{t=1}^{n} \rho\left(\frac{Y_t - \mu'}{c \cdot \hat{s}_y}\right).$$

Bounded and continuous psi-functions result in qualitative robustness for
ordinary location M-estimates at certain distributions, including the normal
distribution. This is true not only when the $Y_t$ are independent and
identically distributed (Hampel, 1971), but also when the $Y_t$ are dependent
(Papantoni-Kazakos and Gray, 1979; Cox, 1981; Boente, Fraiman and Yohai, 1982).

The asymptotic and finite-sample size efficiency robustness of ordinary
location M-estimates have been extensively studied under the independent and
identically distributed observations setup. The issue of efficiency robustness
where the distribution for the data is both dependent and possibly heavy-tailed
non-Gaussian has received relatively little attention. Notable exceptions
include the theoretical work of Portnoy (1977), and the Monte Carlo study of
Wegman and Carroll (1977).

The essence of Portnoy's results is that for moving-average type
non-Gaussian errors with weak correlation structure, ordinary location
M-estimates do well in terms of efficiency relative to the asymptotic
Cramér-Rao lower bound. In addition, through use of a small correlation
expansion, Portnoy was able to obtain approximate asymptotic min-max results
which involved a redescending psi-function.

Portnoy's work left unanswered the question of how ordinary location
M-estimates would fare with moderate to large correlation structures and a
heavy-tailed distribution. This paper partially answers the question through
efficiency comparisons at perfectly-observed non-Gaussian first-order
autoregressive and moving-average models. Efficiencies are obtained by some
exact asymptotic variance calculations, and by Monte Carlo. The results show
that ordinary location M-estimates can be seriously lacking in efficiency
robustness in such situations. On the other hand, as expected, proper
M-estimates have high efficiency robustness.

The  next  section  briefly  introduces  proper  M-estimates,  while  Section  3 
gives  the  asymptotic  variance  expressions  for  both  ordinary  and  proper  M- 
estimates.  These  expressions  reveal  almost  immediately  some  substantially 
negative  aspects  of  ordinary  location  M-estimates  in  dependent  process 


2.  PROPER  M-ESTIMATES  OF  LOCATION 

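Suppose that $\mu$ is a location parameter and that the observations are

$$Y_t = \mu + V_t, \quad t = 1, 2, \ldots, n \tag{2.1}$$

where $V_t$ is an ARMA(p,q) model

$$V_t + \varphi_1 V_{t-1} + \cdots + \varphi_p V_{t-p} = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} \tag{2.2}$$

with the $\varepsilon_t$ being independent and having a common symmetric
distribution $G(\varepsilon) = G_0(\varepsilon/s_\varepsilon)$, $s_\varepsilon$
being a scale parameter for the innovations. The $\varepsilon_t$ are often
called the innovations process. This yields the equivalent ARMA(p,q) model

$$Y_t + \varphi_1 Y_{t-1} + \cdots + \varphi_p Y_{t-p} = \gamma + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} \tag{2.3}$$

where the intercept is

$$\gamma = \mu\left(1 + \sum_{i=1}^{p} \varphi_i\right). \tag{2.4}$$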

Let $\alpha' = (\gamma', \varphi', \theta')$ represent arbitrary parameter
values in the region of stationarity and invertibility for the ARMA process,
and let $\alpha = (\gamma, \varphi, \theta)$ represent the true parameter
values. Denote by $r_t(\alpha')$ the residuals computed from an observed sample
$Y_1, \ldots, Y_n$ by one of the usual variants with regard to initial
conditions (see for example, Box and Jenkins, 1976). An M-estimate of $\alpha$
is a solution of the minimization problem

$$\min_{\alpha'} \sum_{t=1}^{n} \rho\left(\frac{r_t(\alpha')}{c \cdot \hat{s}_\varepsilon}\right) \tag{2.5}$$

where $\rho$ is a robustifying loss function. The constant $c$ is a tuning
constant and $\hat{s}_\varepsilon$ is a robust estimate of the innovations
scale $s_\varepsilon$.

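Now given an M-estimate $\hat{\alpha}$ of $\alpha = (\gamma, \varphi, \theta)$,
the relation (2.4) leads to the proper M-estimate of location

$$\hat{\mu} = \frac{\hat{\gamma}}{1 + \sum_{i=1}^{p} \hat{\varphi}_i}. \tag{2.6}$$

Consistency and asymptotic normality of $\hat{\alpha}$ and $\hat{\mu}$ have
been established by Lee and Martin (1982a).

In the special case where $\rho(t) = -\log g_0(t)$, with $g_0$ the density for
$G_0$, $\hat{\alpha}$ and $\hat{\mu}$ are conditional maximum-likelihood
estimates of $\alpha$ and $\mu$ where the conditioning involves fixing not only
$Y_1, \ldots, Y_p$, but also estimates $\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_q$ of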
3. ASYMPTOTIC CONSIDERATIONS

First consider an ordinary location M-estimate $\hat{\mu}_{OM}$ computed from
observations $Y_1, \ldots, Y_n$ in (2.1) which have a common marginal
distribution $F(y) = F_0((y-\mu)/s_y)$. Under regularity conditions (see for
example Portnoy, 1977) $\hat{\mu}_{OM}$ is consistent and asymptotically
normal, with asymptotic variance given by

$$V_{OM} = \frac{C(0) + 2\sum_{l=1}^{\infty} C(l)}{E^2_{F_0}\psi_c'(Y_1)} \tag{3.1}$$

where

$$C(l) = s_y^2 \, E_{F_{0l}} \psi_c(Y_1)\psi_c(Y_{1+l}), \quad l = 0, 1, 2, \ldots \tag{3.2}$$

Here for $l = 0$, $F_{0l}$ is the standardized marginal distribution $F_0$ of
the $Y_t$, while for $l \ge 1$, $F_{0l}$ is the bivariate distribution for
$(Y_1, Y_{1+l})$ obtained when $\mu = 0$ and $s_y = 1$. The tuning constant $c$
appearing in (1.1) is now (and henceforth) absorbed in the definition of
$\psi_c$. In the special case of independent $Y_t$, $F_0 = G_0$ and $V_{OM}$
reduces to

$$V_{OM} = s_y^2 \, \frac{E_{F_0}\psi_c^2(Y_1)}{E^2_{F_0}\psi_c'(Y_1)} = s_y^2 \, V_{loc}(\psi_c, F_0) \tag{3.3}$$

where $V_{loc} = V_{loc}(\psi, F_0)$, defined by the right-hand equality above,
is P. Huber's (1964) well-known expression for the asymptotic variance of
ordinary location M-estimates.

Now for the case of a proper location M-estimate $\hat{\mu}$, it can be shown
(Lee and Martin, 1982a) that the asymptotic variance expression is

$$V = s_\varepsilon^2 \, \frac{(1 + \sum \theta_i)^2}{(1 + \sum \varphi_i)^2} \, V_{loc}(\psi_c, G_0). \tag{3.4}$$

The quantity $s_\varepsilon^2 (1 + \sum \theta_i)^2 / (1 + \sum \varphi_i)^2$
differs by only a constant factor from the value at zero frequency of the
spectrum of the process $Y_t$. When $\psi$ is the identity function, so that
$\hat{\mu}_{OM} = \hat{\mu}_{LS} = \bar{Y}$, and $s_\varepsilon$ is the
standard deviation, (3.4) yields the well-known result that the asymptotic
variance of the sample mean is given by the spectrum of the process evaluated
at zero frequency (Grenander, 1954, 1981).
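As a small numerical illustration of (3.4), the sketch below evaluates the dependence factor $(1+\sum\theta_i)^2/(1+\sum\varphi_i)^2$ and the resulting asymptotic variance. The function names are hypothetical, the coefficients are assumed to be entered with the sign convention of (2.2), and the value of $V_{loc}$ is assumed to be supplied by the user.

```python
def zero_frequency_factor(phis, thetas):
    """Dependence factor (1 + sum(theta_i))^2 / (1 + sum(phi_i))^2 from (3.4).

    Coefficients follow the sign convention of (2.2):
    V_t + phi_1 V_{t-1} + ... = eps_t + theta_1 eps_{t-1} + ...
    """
    numerator = (1.0 + sum(thetas)) ** 2
    denominator = (1.0 + sum(phis)) ** 2
    return numerator / denominator


def proper_asymptotic_variance(s_eps, phis, thetas, v_loc):
    """Asymptotic variance V of the proper M-estimate, as in (3.4).

    v_loc is Huber's V_loc(psi_c, G_0) for the standardized innovations;
    it is an input here, not computed.
    """
    return s_eps ** 2 * zero_frequency_factor(phis, thetas) * v_loc


# MA(1) with theta = 0.5: the factor is (1 + 0.5)^2
print(zero_frequency_factor([], [0.5]))
# MA(1) with theta = -0.9: the factor is close to zero
print(zero_frequency_factor([], [-0.9]))
# With the identity psi and unit innovations variance, V_loc = 1
print(proper_asymptotic_variance(1.0, [], [0.5], 1.0))
```

Only the factor depends on the ARMA parameters; the tuning constant enters through `v_loc` alone, which is the separation the text relies on.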

The simplicity of the expression for $V$ relative to that of $V_{OM}$ is quite
attractive, particularly with regard to the relative ease of studentizing the
estimate $\hat{\mu}$ for the purpose of constructing confidence intervals.
Estimation of $V$ from the data for this purpose may be quite manageable,
whereas estimation of $V_{OM}$ seems rather impractical when many $C(l)$ are
non-zero. In this regard the situation is particularly bad when an
autoregression component is present, since then the $C(l)$ only vanish
asymptotically.

Furthermore, the effect of the tuning constant $c$ on the asymptotic
efficiency of $\hat{\mu}$ shows up only in the $V_{loc}$ factor of the
expression for $V$. Since $V_{loc}$ is not affected by the dependency structure
for $Y_t$, as specified by the parameters $\varphi_i$ and $\theta_i$,
efficiencies can be controlled through $c$ without regard to the values of
these parameters. This is not the case with regard to $V_{OM}$, as can be seen
in the following equivalent form of (3.1):

$$V_{OM} = \left[1 + 2\sum_{l=1}^{\infty} \rho_{1,1+l}\right] s_y^2 \, V_{loc}(\psi_c, F_0) \tag{3.5}$$

where $\rho_{1,1+l}$ is the correlation coefficient for the random variables
$\psi_c(Y_1)$ and $\psi_c(Y_{1+l})$ when $(Y_1, Y_{1+l}) \sim F_{0l}$. Here the
effects of $c$ appear not only in $V_{loc}$, but also in the correlation
coefficients $\rho_{1,1+l}$, and the latter depend on the ARMA model parameters
$\varphi_i$ and $\theta_i$. This makes the adjustment of $c$ to obtain desired
Gaussian process efficiencies quite onerous, if not impractical.

In lieu of a better scheme, one would probably choose $c$ for $\hat{\mu}_{OM}$
such that a desired efficiency is obtained for independent and identically
distributed Gaussian data. It should be noted that such a value of $c$ yields
the same efficiency for $\hat{\mu}$ at any Gaussian ARMA process (see first
paragraph of Section 4 in this regard).

In order to gain some insight into why $\hat{\mu}$ might be significantly more
efficient than $\hat{\mu}_{OM}$ at highly correlated non-Gaussian ARMA
situations, consider the case where $Y_t$ is a first-order autoregression with
parameter $\varphi$. In this case $V$ may be expressed in the following form,
which facilitates comparison with (3.5):

$$V = \left[\frac{1+\varphi}{1-\varphi}\right] \frac{s_\varepsilon^2}{1-\varphi^2} \, V_{loc}(\psi_c, G_0). \tag{3.6}$$

It is easy to check that the factors in square brackets in (3.5) and (3.6) are
identical when $\psi$ is the identity function. We conjecture that these
factors do not differ by too much for either Gaussian or non-Gaussian processes
$Y_t$ when $\psi$ is one of the popular psi-functions. Assuming that this is
the case, the behavior of $V_{OM}$ relative to $V$ will be determined by the
relative values of $V_{loc}(\psi_c, F_0)$, $V_{loc}(\psi_c, G_0)$, $s_y^2$ and
$s_\varepsilon^2/(1-\varphi^2)$.

Suppose that the same value of tuning constant $c$ is used for both the
ordinary and proper location M-estimates (in view of our previous comments,
this is not an unlikely scenario). Then we can expect that in many non-Gaussian
situations $V_{loc}(\psi_c, F_0)$ will be larger than $V_{loc}(\psi_c, G_0)$
when $\varphi \ne 0$. This is because $Y_t$ is a weighted sum of the
$\varepsilon_t$, and the convolutions which produce $F_0$ from non-Gaussian
$G_0$ will often result in an $F_0$ having heavier tails than $G_0$. At the
same time $s_y^2$ and $s_\varepsilon^2/(1-\varphi^2)$ will be identical in
finite-variance non-Gaussian situations, and then we may expect that $V_{OM}$
is larger than $V$.

Of course for stable $G_0$ we will have $F_0 = G_0$, and then the two
$V_{loc}$'s will be identical. However, in such a case $s_y^2$ and
$s_\varepsilon^2/(1-\varphi^2)$ will no longer be identical (except in the
Gaussian case). For example, when $G_0$ is a symmetric stable distribution with
index $\eta$, $F_0$ is also a symmetric stable distribution, and it is easy to
check that (see Feller, 1966)

$$R = \frac{s_y^2}{s_\varepsilon^2/(1-\varphi^2)} = \frac{1-\varphi^2}{(1-|\varphi|^\eta)^{2/\eta}}. \tag{3.7}$$

The Cauchy distribution is obtained when $\eta = 1$, and in this case we have
$R = 3$ and $19$ when $\varphi = 0.5$ and $0.9$, respectively. If we assume
that the expressions (3.5) and (3.6) hold for infinite-variance situations, and
that the square-bracketed factors in (3.5) and (3.6) are not too different,
then $V_{OM}$ may be much larger than $V$.
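The ratio in (3.7) is easy to evaluate numerically. The following minimal sketch (the function name is hypothetical) computes $R$ for a first-order autoregression with symmetric stable innovations of index $\eta$, and can be used to check the Cauchy values quoted above as well as the Gaussian case $\eta = 2$, where the ratio is one.

```python
def stable_scale_ratio(phi, eta):
    """R = s_y^2 / (s_eps^2 / (1 - phi^2)) from (3.7).

    phi : AR(1) parameter, |phi| < 1
    eta : index of the symmetric stable innovations, 0 < eta <= 2
    """
    if not 0.0 < eta <= 2.0:
        raise ValueError("eta must lie in (0, 2]")
    if not abs(phi) < 1.0:
        raise ValueError("phi must satisfy |phi| < 1")
    return (1.0 - phi ** 2) / (1.0 - abs(phi) ** eta) ** (2.0 / eta)


# Cauchy innovations (eta = 1) at the two values of phi used in the text
for phi in (0.5, 0.9):
    print(phi, stable_scale_ratio(phi, 1.0))

# Gaussian innovations (eta = 2): the two scale measures agree
print(stable_scale_ratio(0.9, 2.0))
```

The ratio grows quickly as $|\varphi|$ approaches one for small $\eta$, which is the regime in which the text expects $V_{OM}$ to exceed $V$ by a wide margin.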

In the concluding comments section of the paper, a more direct heuristic
argument is also offered in explanation of the relative inefficiency of
$\hat{\mu}_{OM}$.
4. EXACT ASYMPTOTIC RELATIVE EFFICIENCY RESULTS

The asymptotic absolute efficiencies of a proper M-estimate at various
distributions are the same as those of an ordinary location M-estimate based on
matching $\psi_c$, with independent observations. This follows from the fact
that the asymptotic lower bound on variance is given by (3.4) with $V_{loc}$
replaced by the reciprocal of the Fisher information
$i(g_0) = \int (g_0'/g_0)^2 g_0$ for the standardized innovations density $g_0$
(Martin, 1982).

Since the literature abounds with asymptotic efficiency computations for
ordinary location M-estimates based on various $\psi_c$ and independent $Y_t$,
our main interest is in comparing $\hat{\mu}_{OM}$ with $\hat{\mu}$ for the
model (2.1) - (2.2). Thus we wish to compute the asymptotic relative
efficiencies

$$AREFF = AREFF(\psi_c, G_0, \alpha) = \frac{V}{V_{OM}} \tag{4.1}$$

for various $\psi_c$, $G_0$ and $\alpha$.

This task is made difficult mainly because of the relatively complex
structure of $F_{0l}$. For example, to compute (3.1) in the case of first-order
autoregressions, both the stationary distribution $F_0$ and the bivariate
distributions $F_{0l}$, $l = 1, 2, \ldots$, are required. Unfortunately, we can
seldom specify $F_0$ and $F_{0l}$, $l = 1, 2, \ldots$, in closed form when
$G_0$ is non-Gaussian (symmetric stable $G_0$ is the main exception). Thus we
study the case of a first-order autoregression solely via Monte Carlo in the
next section.

On the other hand, for moving-average processes of order $q$, the summation
in (3.1) contains only a finite number of non-zero terms, and for small $q$ we
can sometimes find closed form expressions for the $C(l)$, $l = 0, 1, \ldots, q$,
and $E_{F_0}\psi_c'$.

We treat here the MA(1) case with parameter $\theta$, where (i)
$\varepsilon_t$ has a contaminated normal distribution,

$$CN(\delta, \sigma^2) = (1-\delta)N(0,1) + \delta N(0,\sigma^2) \tag{4.2}$$

and (ii) $\psi$ has either the normal distribution shape

$$\psi_\Phi(t) = \sqrt{2\pi}\left[\Phi(t) - \tfrac{1}{2}\right] \tag{4.3}$$

or the shape of the derivative of the normal density,

$$\psi_{ND}(t) = t \, e^{-t^2/2}. \tag{4.4}$$

For either of the combinations (4.2) - (4.3) or (4.2) - (4.4), a closed form
expression for $V_{OM}$ (and also for $V$) is obtained in a straightforward but
tedious manner. These rather ugly expressions are developed in the Appendix.

It should be kept in mind that $\psi_\Phi$ and $\psi_{ND}$ are used here only
because: (i) they facilitate an exact calculation, and (ii) at the same time
yield comparable efficiency robustness to that obtainable with Huber's (1964)
favorite psi-function $\psi_H(t) = \max(-1, \min(1, t))$, and Tukey's bisquare
psi-function (see Mosteller and Tukey, 1977), respectively. Point (ii) was
verified through Monte Carlo results not reported here.

Except for the second set of results in this section, the tuning constants
$c_{OM}$ and $c$ for the ordinary and proper M-estimates are adjusted so that
for both $\psi_{ND}$ and $\psi_\Phi$, $\hat{\mu}_{OM}$ and $\hat{\mu}$ have
matched asymptotic efficiencies of .90 for independent Gaussian observations
($\theta = 0$).
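The adjustment of a tuning constant to a target Gaussian efficiency can be sketched numerically for $\psi_\Phi$. For standard normal data and scaled psi-function $k\sqrt{2\pi}[\Phi(t/k) - 1/2]$, the two moments needed are $E\psi' = \sqrt{k^2/(1+k^2)}$ and $E\psi^2 = k^2 \tan^{-1}\{(1/k^2)/\sqrt{1+2/k^2}\}$ (the $\delta = 0$ case of the Appendix expressions), so the efficiency is $(E\psi')^2/E\psi^2$. The code below solves for $k$ by bisection; the function names and the bracketing interval are assumptions of this sketch, not part of the paper.

```python
import math


def gaussian_efficiency_psi_phi(k):
    """Asymptotic efficiency at N(0,1) of the location M-estimate based on
    psi(t) = k * sqrt(2*pi) * (Phi(t/k) - 1/2).

    Efficiency = (E psi')^2 / E psi^2, since the Fisher information of the
    standard normal is 1.
    """
    e_psi_prime = math.sqrt(k * k / (1.0 + k * k))
    e_psi_sq = k * k * math.atan((1.0 / (k * k)) / math.sqrt(1.0 + 2.0 / (k * k)))
    return e_psi_prime ** 2 / e_psi_sq


def tuning_constant_for(efficiency, lo=1e-3, hi=50.0, tol=1e-10):
    """Bisection for the k giving the requested Gaussian efficiency.

    Assumes the efficiency increases with k on [lo, hi]; it ranges from
    about 2/pi (median-like) at small k toward 1 (mean-like) at large k.
    """
    if not gaussian_efficiency_psi_phi(lo) < efficiency < gaussian_efficiency_psi_phi(hi):
        raise ValueError("target efficiency is outside the bracketed range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gaussian_efficiency_psi_phi(mid) < efficiency:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for eff in (0.95, 0.90, 0.85, 0.80):
    print(eff, round(tuning_constant_for(eff), 3))
```

The values obtained this way should be close to the $\theta = 0$ row of Table 1, which is the independent Gaussian case.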

Figure 1 shows AREFF's based on $\psi_{ND}$ for various $\theta$ values, where
$\varepsilon_t \sim CN(\delta, \sigma^2)$ with $\delta = 0.1$,
$1 \le \sigma \le 10$. Although the AREFF's can be quite low for negative
$\theta$, they are quite high for a wide range of positive $\theta$.

In Figure 2 we display AREFF's based on $\psi_{ND}$ for the same values of
$\delta$, $\sigma^2$ and $\theta$, except that $c_{OM}$ has been adjusted to
obtain matching asymptotic efficiencies of .90 for each value of $\theta$ and
Gaussian $\varepsilon_t$. The values of tuning constants
$c_{OM} = c_{OM}(\theta)$ needed to achieve various efficiencies are given in
Table 1 for $\psi_\Phi$, and in Table 2 for $\psi_{ND}$. While marked
improvement in the relative performance of $\hat{\mu}_{OM}$ is achieved at
$\theta = -.5$ and $-.9$ at small values of $\sigma$, the improvement at large
values of $\sigma$ is negligible. Thus even "proper" adjustment of $c$ using
typically unavailable prior information on $\theta$ will not salvage
$\hat{\mu}_{OM}$ for MA(1) models with negative $\theta$.

Figures 1a and 2a give corresponding AREFF's based on $\psi_\Phi$. Although
$\psi_{ND}$ has the edge over $\psi_\Phi$ at some $\theta$ values, the results
are not overall too different from those in Figures 1 and 2.

TABLE 1

Tuning constants $c_{OM} = c_{OM}(\theta)$
which yield various efficiencies for $\psi_\Phi$

theta \ EFF     0.95     0.90     0.85     0.80
   -0.9        4.450    3.651    3.222    2.927
   -0.7        2.343    1.870    1.611    1.431
   -0.5        1.669    1.287    1.074    0.923
   -0.3        1.299    0.959    0.765    0.625
   -0.1        1.049    0.731    0.546    0.409
    0.0        0.952    0.642    0.460    0.324
    0.1        0.876    0.571    0.390    0.255
    0.3        0.781    0.480    0.298    0.161
    0.5        0.747    0.443    0.257    0.115
    0.7        0.741    0.433    0.243    0.097
    0.9        0.741    0.431    0.239    0.092


5.  MONTE  CARLO  RELATIVE  EFFICIENCIES 

In order to check the finite-sample size relative efficiencies (REFFs) of
$\hat{\mu}_{OM}$ and $\hat{\mu}$ for both MA(1) models, as used for Figure 1,
and AR(1) models, some Monte Carlo computations were carried out using 500
replications at sample size 100. Tuning constants $c_{OM}$ were adjusted for
asymptotic efficiencies of 0.9 at independent Gaussian $Y_t$, as described in
the previous section.

The ordinary location M-estimates were computed using the median as a
starting point, followed by 4 iterations of iterated-weighted least-squares
using $\psi_\Phi$, followed by one iteration using $\psi_{ND}$. The proper
M-estimates were computed using 10 iterations of a nonlinear optimization
algorithm for solving (2.5), which is described in Lee and Martin (1982b),
followed by computing $\hat{\mu}$ from (2.6).
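The recipe for the ordinary location M-estimate (median start, four iteratively reweighted least-squares steps with $\psi_\Phi$, one step with $\psi_{ND}$) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the scale estimate (normalized median absolute deviation, held fixed), the function names, and the $\psi_{ND}$ tuning constant `1.7` are placeholders chosen for the example; only the value `0.642` is taken from Table 1 ($\theta = 0$, efficiency 0.90).

```python
import math
import random
import statistics


def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def weight_cumulative_normal(t):
    """w(t) = psi(t)/t for psi(t) = sqrt(2*pi) * (Phi(t) - 1/2); w(0) = 1."""
    if abs(t) < 1e-8:
        return 1.0
    return math.sqrt(2.0 * math.pi) * (normal_cdf(t) - 0.5) / t


def weight_normal_derivative(t):
    """w(t) = psi(t)/t for psi(t) = t * exp(-t^2 / 2)."""
    return math.exp(-0.5 * t * t)


def mad_scale(y):
    """Normalized median absolute deviation (an assumed choice of robust scale)."""
    med = statistics.median(y)
    return statistics.median([abs(v - med) for v in y]) / 0.6745


def irls_step(y, mu, scale, c, weight):
    """One weighted-mean update with weights evaluated at the current mu."""
    w = [weight((v - mu) / (c * scale)) for v in y]
    return sum(wi * v for wi, v in zip(w, y)) / sum(w)


def ordinary_m_estimate(y, c_mono, c_redesc, n_mono=4):
    mu = statistics.median(y)            # starting point
    scale = mad_scale(y)                 # held fixed during the iterations
    for _ in range(n_mono):              # steps with the monotone psi
        mu = irls_step(y, mu, scale, c_mono, weight_cumulative_normal)
    # final single step with the redescending psi
    return irls_step(y, mu, scale, c_redesc, weight_normal_derivative)


random.seed(0)
sample = [random.gauss(5.0, 1.0) for _ in range(100)] + [50.0] * 5
print(ordinary_m_estimate(sample, c_mono=0.642, c_redesc=1.7))
```

Starting the redescending step from a monotone-psi solution is a common way to avoid poor stationary points of a redescending estimating equation, which is presumably why the final step is applied only once.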

The results for the MA(1) case using $\psi_{ND}$ are shown in Figure 3. The
REFFs are in quite good agreement with the asymptotic AREFF's of Figure 1,
except for $\sigma = 1$ (the Gaussian case).

Results for the AR(1) case using $\psi_{ND}$ are given in Figure 4. Here REFFs
can be quite low for positive $\varphi$ as well as negative, the former case
being the more commonly encountered one in practice. Furthermore,
$\varphi = \pm 0.5$ can already result in REFFs as low as 70% for large
$\sigma$, and for larger $|\varphi|$ the relative loss in efficiency associated
with $\hat{\mu}_{OM}$ may become quite intolerable. Also, the REFFs are roughly
symmetric in $\varphi$, which contrasts sharply with the MA(1) results of
Figure 3.

Figures 3a and 4a give corresponding results using $\psi_\Phi$. Again,
$\psi_{ND}$ tends to dominate somewhat, but the differences are not
overwhelming.

As a check on the "absolute" efficiencies of $\hat{\mu}$ at MA(1) and AR(1)
models, we provide Figures 5 and 6 for $\psi_{ND}$ and Figures 5a and 5b for
$\psi_\Phi$. By "absolute" efficiencies (EFF's) we mean the asymptotic
Cramér-Rao lower bound divided by the Monte Carlo variance. Except for the case
$\theta = -0.9$, which requires large sample sizes to achieve high absolute
efficiencies, $\hat{\mu}$ is very efficient for almost all other cases at a
sample size of 100. With regard to the case $\theta = -.9$, one should keep in
mind that $\theta = -1$ is a distinguished point of superefficiency (see for
example, Chapter 4.4 of Grenander, 1981).


6.  CONCLUDING  COMMENTS 


The following simple heuristic argument indicates why $\hat{\mu}$ should
generally be more precise than $\hat{\mu}_{OM}$, particularly in the case of
autoregressions with moderate to large correlation. Suppose one is using
$\hat{\mu}_{OM}$ with robust scale estimate $\hat{s}_y$, and that the series
contains just one huge isolated outlier in the $\varepsilon_t$ at time $t_0$,
say, after which the sample path will decay roughly like the homogeneous
solution to (2.2). The first part of this decay will produce residuals
$r_t = Y_t - \hat{\mu}$, $t \ge t_0$, which exceed $\hat{s}_y$ in magnitude and
will thus be downweighted. Unfortunately, it is only the initial residual
$r_{t_0}$ that deserves downweighting, and this results in loss of information.
Because the residuals in (2.5) are based on the regression with intercept form
(2.3), only the residual at time $t_0$ will be heavily downweighted, and
information in the immediately succeeding observations will be utilized.
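The heuristic can be made concrete with a noise-free idealization. The sketch below builds a first-order autoregression whose only nonzero innovation is a single large outlier at time `t0`, and counts how many location residuals and how many autoregression residuals exceed a nominal scale of one. The model is written in the common form $Y_t = \varphi Y_{t-1} + \varepsilon_t$ with $\mu = \gamma = 0$; this sign convention, the zero-noise setting, and all names and numerical values are assumptions made for illustration.

```python
def ar1_path_with_innovation_outlier(phi, n, t0, size):
    """Noise-free AR(1) path Y_t = phi * Y_{t-1} + eps_t with eps_t = 0
    except for a single innovation outlier of the given size at time t0."""
    y = [0.0] * n
    for t in range(1, n):
        eps = size if t == t0 else 0.0
        y[t] = phi * y[t - 1] + eps
    return y


phi, n, t0, size = 0.9, 100, 20, 100.0
mu, gamma, scale = 0.0, 0.0, 1.0   # true location, intercept, nominal scale

y = ar1_path_with_innovation_outlier(phi, n, t0, size)

# Residuals seen by the ordinary location M-estimate: Y_t - mu
location_residuals = [y[t] - mu for t in range(1, n)]
# Residuals seen by the proper M-estimate: Y_t - gamma - phi * Y_{t-1}
ar_residuals = [y[t] - gamma - phi * y[t - 1] for t in range(1, n)]

n_large_location = sum(abs(r) > scale for r in location_residuals)
n_large_ar = sum(abs(r) > scale for r in ar_residuals)

print("large location residuals:", n_large_location)
print("large autoregression residuals:", n_large_ar)
```

In this idealized setting the location residuals stay large for the whole decay of the sample path, while only the residual at `t0` is large in the autoregression form, which is the loss of information the argument describes.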

This argument can also be cast in terms of the scatter plot of $Y_t$ versus
$Y_{t-1}$, say for an AR(1) process, in the spirit of Cox's (1966) comments
with regard to the null distribution of the serial correlation coefficient. The
pair $(Y_{t_0-1}, Y_{t_0})$ will be far removed from the regression line with
slope $\varphi$ and intercept $\gamma$, but the pairs $(Y_{t-1}, Y_t)$,
$t = t_0+1, \ldots$, constitute good leverage points (i.e., points which will
lie close to the regression line and are large in magnitude) for estimating
$\gamma$ and $\varphi$, the latter with ultra precision when $\varepsilon_t$
has a heavy-tailed distribution (Martin, 1982). The ordinary location
M-estimate would downweight such points.

The asymptotic and finite-sample efficiencies of $\hat{\mu}_{OM}$ relative to
$\hat{\mu}$, along with the awkwardness and impracticality of assessing the
variability of $\hat{\mu}_{OM}$, suggest that it should be used only when one
is certain that the correlation structure of the errors is quite weak. For
situations where the non-Gaussian ARMA model (2.1)-(2.2) is a good
approximation to reality, the proper M-estimate $\hat{\mu}$ is preferred.

When (2.1)-(2.2) does not provide a good model for non-Gaussian time series
with outliers, e.g., when $Y_t$ is corrupted with additive outliers, then the
proper M-estimate $\hat{\mu}$ will no longer be advisable since it is not
robust toward such deviations from a nominal Gaussian ARMA model (see Martin
and Yohai, 1984). More generally, $\hat{\mu}$ is not robust over a full
neighborhood of the nominal Gaussian model. An alternative proposal for
estimating $\mu$ is mentioned in Section VIII of Martin (1981). A detailed
study of this alternative, among others, is called for.
Appendix


ASYMPTOTIC  VARIANCE  EXPRESSIONS 


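As was mentioned in Section 4, one can obtain closed form expressions for
$V_{OM}$ in (3.1) as well as $V$ in (3.4) for the special case where
$\varepsilon_t \sim CN(\delta, \sigma^2)$ and either $\psi = \psi_\Phi$ or
$\psi = \psi_{ND}$ (see equations (4.2)-(4.4)). The keys to this are the
following relationships:

$$\int \Phi(\alpha x + \beta) \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx = \Phi\left(\frac{\beta}{\sqrt{1+\alpha^2}}\right) \tag{A.1}$$

$$\int \Phi(\alpha x)\Phi(\beta x) \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx = \frac{1}{4} + \frac{1}{2\pi}\tan^{-1}\left(\frac{\alpha\beta}{\sqrt{1+\alpha^2+\beta^2}}\right) \tag{A.2}$$

A.1 was given by Gupta, S.S. and Pillai, K.C.S. (1965), and a proof of A.2 may
be found in Jong (1977, Lemma 16).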


The  Cumulative  Normal  Psi-Function 


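Since $\varepsilon_t \sim G = CN(\delta, \sigma^2) = (1-\delta)N(0,1) + \delta N(0,\sigma^2)$,
the MA(1) process $\{Y_t\}$ has the four-component normal mixture distribution

$$F = NM(\theta, \delta, \sigma^2) = (1-\delta)^2 N(0, 1+\theta^2) + \delta(1-\delta) N(0, \sigma^2+\theta^2) + \delta(1-\delta) N(0, 1+\theta^2\sigma^2) + \delta^2 N(0, (1+\theta^2)\sigma^2).$$

Let $\psi_c(\varepsilon)$ denote $\psi_\Phi$ scaled for the error process
$\varepsilon_t$:

$$\psi_c(\varepsilon) = c \cdot s_\varepsilon \, \psi_\Phi\left(\frac{\varepsilon}{c s_\varepsilon}\right) = c s_\varepsilon \sqrt{2\pi}\left[\Phi\left(\frac{\varepsilon}{c s_\varepsilon}\right) - \frac{1}{2}\right] \tag{A.3}$$

where $c$ is the tuning constant. This we use in computing $V$. Similarly, in
computing $V_{OM}$ we use

$$\psi_{c_{OM}}(y) = c_{OM} s_y \sqrt{2\pi}\left[\Phi\left(\frac{y}{c_{OM} s_y}\right) - \frac{1}{2}\right] \tag{A.4}$$

which is $\psi_\Phi$ scaled for the $Y_t$ process, with tuning constant $c_{OM}$.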
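The four-component mixture $NM(\theta, \delta, \sigma^2)$ for the MA(1) marginal can be checked with a few lines of code: the weights must sum to one, and the mixture variance must equal $(1+\theta^2)$ times the variance $(1-\delta) + \delta\sigma^2$ of the contaminated normal innovations. The function name and the parameter values below are hypothetical choices for illustration.

```python
def ma1_marginal_mixture(theta, delta, sigma2):
    """Components (weight, variance) of the zero-mean normal mixture for
    Y_t = eps_t + theta * eps_{t-1}, with eps_t ~ CN(delta, sigma2)."""
    return [
        ((1.0 - delta) ** 2, 1.0 + theta ** 2),              # both from N(0,1)
        (delta * (1.0 - delta), sigma2 + theta ** 2),         # eps_t contaminated
        (delta * (1.0 - delta), 1.0 + theta ** 2 * sigma2),   # eps_{t-1} contaminated
        (delta ** 2, (1.0 + theta ** 2) * sigma2),            # both contaminated
    ]


theta, delta, sigma2 = -0.5, 0.1, 9.0
components = ma1_marginal_mixture(theta, delta, sigma2)

total_weight = sum(w for w, _ in components)
mixture_variance = sum(w * v for w, v in components)
innovation_variance = (1.0 - delta) + delta * sigma2

print(total_weight)
print(mixture_variance, (1.0 + theta ** 2) * innovation_variance)
```

Each expectation under $F$ in the expressions that follow is a weighted sum over these four components, which is why the closed forms contain four terms.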

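First we get the expression for $V$ with $k = c s_\varepsilon$ and
$G = CN(\delta, \sigma^2)$, using A.2 and A.3:

$$E_G\psi_c^2(\varepsilon) = k^2\left[(1-\delta)\tan^{-1}\frac{1/k^2}{\sqrt{1+2/k^2}} + \delta\tan^{-1}\frac{\sigma^2/k^2}{\sqrt{1+2\sigma^2/k^2}}\right] \tag{A.5}$$

$$E_G\psi_c'(\varepsilon) = (1-\delta)\sqrt{k^2/(1+k^2)} + \delta\sqrt{k^2/(\sigma^2+k^2)} \tag{A.6}$$

$$V = (1+\theta)^2 \frac{E_G\psi_c^2(\varepsilon)}{E_G^2\psi_c'(\varepsilon)}
= \frac{(1+\theta)^2 k^2\left[(1-\delta)\tan^{-1}\dfrac{1/k^2}{\sqrt{1+2/k^2}} + \delta\tan^{-1}\dfrac{\sigma^2/k^2}{\sqrt{1+2\sigma^2/k^2}}\right]}{\left[(1-\delta)\sqrt{k^2/(1+k^2)} + \delta\sqrt{k^2/(\sigma^2+k^2)}\right]^2} \tag{A.7}$$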

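As for $V_{OM}$, we need to evaluate $C(0)$ and $C(1)$ using $\psi_{c_{OM}}$.
With $k_y = c_{OM} s_y$ and $F = NM(\theta, \delta, \sigma^2)$, A.2 and A.4 give

$$E_F\psi_{c_{OM}}'(Y_1) = (1-\delta)^2\sqrt{\frac{k_y^2}{1+\theta^2+k_y^2}} + \delta(1-\delta)\sqrt{\frac{k_y^2}{1+\theta^2\sigma^2+k_y^2}} + \delta(1-\delta)\sqrt{\frac{k_y^2}{\theta^2+\sigma^2+k_y^2}} + \delta^2\sqrt{\frac{k_y^2}{(1+\theta^2)\sigma^2+k_y^2}} \tag{A.8}$$

$$C(0) = E_F\psi_{c_{OM}}^2(Y_1) = k_y^2\left\{(1-\delta)^2\tan^{-1}\frac{(1+\theta^2)k_y^{-2}}{\sqrt{1+2(1+\theta^2)k_y^{-2}}} + \delta(1-\delta)\tan^{-1}\frac{(\theta^2+\sigma^2)k_y^{-2}}{\sqrt{1+2(\theta^2+\sigma^2)k_y^{-2}}} + \delta(1-\delta)\tan^{-1}\frac{(1+\theta^2\sigma^2)k_y^{-2}}{\sqrt{1+2(1+\theta^2\sigma^2)k_y^{-2}}} + \delta^2\tan^{-1}\frac{(1+\theta^2)\sigma^2 k_y^{-2}}{\sqrt{1+2(1+\theta^2)\sigma^2 k_y^{-2}}}\right\} \tag{A.9}$$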

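Now for $C(1)$, first note that

$$\psi_{c_{OM}}(Y_t)\psi_{c_{OM}}(Y_{t+1}) = 2\pi k_y^2\Big[\Phi(k_y^{-1}\varepsilon_t + \theta k_y^{-1}\varepsilon_{t-1})\,\Phi(k_y^{-1}\varepsilon_{t+1} + \theta k_y^{-1}\varepsilon_t) - \tfrac{1}{2}\Phi(k_y^{-1}\varepsilon_t + \theta k_y^{-1}\varepsilon_{t-1}) - \tfrac{1}{2}\Phi(k_y^{-1}\varepsilon_{t+1} + \theta k_y^{-1}\varepsilon_t) + \tfrac{1}{4}\Big] \tag{A.10}$$

Since the $\varepsilon_t$ are i.i.d. with distribution $CN(\delta, \sigma^2)$,
we can condition on $\varepsilon_t$ and apply (A.1) to get

$$E[\Phi(k_y^{-1}\varepsilon_t + \theta k_y^{-1}\varepsilon_{t-1}) \mid \varepsilon_t] = (1-\delta)\Phi\left(\frac{k_y^{-1}\varepsilon_t}{\sqrt{1+\theta^2 k_y^{-2}}}\right) + \delta\Phi\left(\frac{k_y^{-1}\varepsilon_t}{\sqrt{1+\sigma^2\theta^2 k_y^{-2}}}\right) \tag{A.11}$$

Similarly

$$E[\Phi(k_y^{-1}\varepsilon_{t+1} + \theta k_y^{-1}\varepsilon_t) \mid \varepsilon_t] = (1-\delta)\Phi\left(\frac{\theta k_y^{-1}\varepsilon_t}{\sqrt{1+k_y^{-2}}}\right) + \delta\Phi\left(\frac{\theta k_y^{-1}\varepsilon_t}{\sqrt{1+\sigma^2 k_y^{-2}}}\right) \tag{A.12}$$

Taking expectation with respect to $\varepsilon_t$ in (A.11) and (A.12), and
using A.1 with $\beta = 0$, gives

$$E_G\Phi(k_y^{-1}\varepsilon_t + \theta k_y^{-1}\varepsilon_{t-1}) = \Phi(0) = \tfrac{1}{2} \tag{A.13}$$

$$E_G\Phi(k_y^{-1}\varepsilon_{t+1} + \theta k_y^{-1}\varepsilon_t) = \Phi(0) = \tfrac{1}{2} \tag{A.14}$$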

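For the expectation of the first term on the right-hand side of (A.10), we
again use the results in (A.11) and (A.12):

$$\begin{aligned}
E[\Phi(k_y^{-1}\varepsilon_t + \theta k_y^{-1}\varepsilon_{t-1})\,\Phi(k_y^{-1}\varepsilon_{t+1} + \theta k_y^{-1}\varepsilon_t)]
&= (1-\delta)^2 E_G\,\Phi\left(\frac{k_y^{-1}\varepsilon_t}{\sqrt{1+\theta^2 k_y^{-2}}}\right)\Phi\left(\frac{\theta k_y^{-1}\varepsilon_t}{\sqrt{1+k_y^{-2}}}\right) \\
&\quad + \delta(1-\delta) E_G\,\Phi\left(\frac{k_y^{-1}\varepsilon_t}{\sqrt{1+\theta^2 k_y^{-2}}}\right)\Phi\left(\frac{\theta k_y^{-1}\varepsilon_t}{\sqrt{1+\sigma^2 k_y^{-2}}}\right) \\
&\quad + \delta(1-\delta) E_G\,\Phi\left(\frac{k_y^{-1}\varepsilon_t}{\sqrt{1+\sigma^2\theta^2 k_y^{-2}}}\right)\Phi\left(\frac{\theta k_y^{-1}\varepsilon_t}{\sqrt{1+k_y^{-2}}}\right) \\
&\quad + \delta^2 E_G\,\Phi\left(\frac{k_y^{-1}\varepsilon_t}{\sqrt{1+\sigma^2\theta^2 k_y^{-2}}}\right)\Phi\left(\frac{\theta k_y^{-1}\varepsilon_t}{\sqrt{1+\sigma^2 k_y^{-2}}}\right) \\
&= (1-\delta)^2 A_1 + \delta(1-\delta)[A_2 + A_3] + \delta^2 A_4
\end{aligned} \tag{A.15}$$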
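The expectations $A_1$-$A_4$ in (A.15) can be obtained by applying (A.2) with
constants appropriately adjusted:

$$\begin{aligned}
A_1 &= \frac{1}{4} + \frac{(1-\delta)}{2\pi}\tan^{-1}\left\{\theta k_y^{-1}\left[k_y^2(1+\theta^2 k_y^{-2})(1+k_y^{-2}) + (1+k_y^{-2}) + \theta^2(1+\theta^2 k_y^{-2})\right]^{-1/2}\right\} \\
&\quad + \frac{\delta}{2\pi}\tan^{-1}\left\{\theta\sigma^2 k_y^{-1}\left[k_y^2(1+\theta^2 k_y^{-2})(1+k_y^{-2}) + \sigma^2(1+k_y^{-2}) + \theta^2\sigma^2(1+\theta^2 k_y^{-2})\right]^{-1/2}\right\} \\
&= \frac{1}{4} + \frac{(1-\delta)}{2\pi}A_{11} + \frac{\delta}{2\pi}A_{12}
\end{aligned} \tag{A.16}$$

$$\begin{aligned}
A_2 &= \frac{1}{4} + \frac{(1-\delta)}{2\pi}\tan^{-1}\left\{\theta k_y^{-1}\left[k_y^2(1+\theta^2 k_y^{-2})(1+\sigma^2 k_y^{-2}) + (1+\sigma^2 k_y^{-2}) + \theta^2(1+\theta^2 k_y^{-2})\right]^{-1/2}\right\} \\
&\quad + \frac{\delta}{2\pi}\tan^{-1}\left\{\theta\sigma^2 k_y^{-1}\left[k_y^2(1+\theta^2 k_y^{-2})(1+\sigma^2 k_y^{-2}) + \sigma^2(1+\sigma^2 k_y^{-2}) + \theta^2\sigma^2(1+\theta^2 k_y^{-2})\right]^{-1/2}\right\} \\
&= \frac{1}{4} + \frac{(1-\delta)}{2\pi}A_{21} + \frac{\delta}{2\pi}A_{22}
\end{aligned} \tag{A.17}$$

$$\begin{aligned}
A_3 &= \frac{1}{4} + \frac{(1-\delta)}{2\pi}\tan^{-1}\left\{\theta k_y^{-1}\left[k_y^2(1+\sigma^2\theta^2 k_y^{-2})(1+k_y^{-2}) + (1+k_y^{-2}) + \theta^2(1+\sigma^2\theta^2 k_y^{-2})\right]^{-1/2}\right\} \\
&\quad + \frac{\delta}{2\pi}\tan^{-1}\left\{\theta\sigma^2 k_y^{-1}\left[k_y^2(1+\sigma^2\theta^2 k_y^{-2})(1+k_y^{-2}) + \sigma^2(1+k_y^{-2}) + \sigma^2\theta^2(1+\sigma^2\theta^2 k_y^{-2})\right]^{-1/2}\right\} \\
&= \frac{1}{4} + \frac{(1-\delta)}{2\pi}A_{31} + \frac{\delta}{2\pi}A_{32}
\end{aligned} \tag{A.18}$$

and finally,

$$\begin{aligned}
A_4 &= \frac{1}{4} + \frac{(1-\delta)}{2\pi}\tan^{-1}\left\{\theta k_y^{-1}\left[k_y^2(1+\sigma^2\theta^2 k_y^{-2})(1+\sigma^2 k_y^{-2}) + (1+\sigma^2 k_y^{-2}) + \theta^2(1+\sigma^2\theta^2 k_y^{-2})\right]^{-1/2}\right\} \\
&\quad + \frac{\delta}{2\pi}\tan^{-1}\left\{\theta\sigma^2 k_y^{-1}\left[k_y^2(1+\sigma^2\theta^2 k_y^{-2})(1+\sigma^2 k_y^{-2}) + \sigma^2(1+\sigma^2 k_y^{-2}) + \sigma^2\theta^2(1+\sigma^2\theta^2 k_y^{-2})\right]^{-1/2}\right\} \\
&= \frac{1}{4} + \frac{(1-\delta)}{2\pi}A_{41} + \frac{\delta}{2\pi}A_{42}
\end{aligned} \tag{A.19}$$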


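Now applying (A.15) - (A.19), we have

$$C(1) = E_F\{\psi_{c_{OM}}(Y_t)\psi_{c_{OM}}(Y_{t+1})\} = k_y^2\left\{(1-\delta)^3 A_{11} + \delta(1-\delta)^2(A_{12}+A_{21}+A_{31}) + \delta^2(1-\delta)(A_{22}+A_{32}+A_{41}) + \delta^3 A_{42}\right\} \tag{A.20}$$

Therefore, (A.8), (A.9) and (A.20) can be combined to get the closed form for
$V_{OM}$:

$$V_{OM} = \frac{C(0) + 2C(1)}{E_F^2\psi_{c_{OM}}'(Y_1)} \tag{A.21}$$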


Normal  Derivative  Psi-Function 

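Let $\psi_c$ denote $\psi_{ND}$ scaled for $\varepsilon_t$, with tuning
constant $c$:

$$\psi_c(\varepsilon) = c s_\varepsilon \, \psi_{ND}(\varepsilon/c s_\varepsilon) = \varepsilon \exp[-\varepsilon^2/2c^2 s_\varepsilon^2] \tag{A.22}$$

Similarly, let $\psi_{c_{OM}}$ denote $\psi_{ND}$ scaled for $Y_t$, with tuning
constant $c_{OM}$:

$$\psi_{c_{OM}}(y) = y \exp[-y^2/2c_{OM}^2 s_y^2] \tag{A.23}$$

First we obtain the expression for $V$, with $k = c s_\varepsilon$ and
$G = CN(\delta, \sigma^2)$. Direct evaluation gives

$$E_G\psi_c^2(\varepsilon) = (1-\delta)\frac{k^3}{(2+k^2)^{3/2}} + \delta\frac{\sigma^2 k^3}{(2\sigma^2+k^2)^{3/2}} \tag{A.24}$$

and

$$E_G\psi_c'(\varepsilon) = (1-\delta)\frac{k^3}{(1+k^2)^{3/2}} + \delta\frac{k^3}{(\sigma^2+k^2)^{3/2}} \tag{A.25}$$

Now $V = (1+\theta)^2 E_G\psi_c^2(\varepsilon)/E_G^2\psi_c'(\varepsilon)$, as
in (A.7), may be computed from A.24 and A.25.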

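Next we evaluate $E_F\psi_{c_{OM}}'(Y_1)$, $C(0)$ and $C(1)$, with
$F = NM(\theta, \delta, \sigma^2)$ and $k_y = c_{OM} s_y$, in order to compute
$V_{OM}$. First, we have

$$E_F\psi_{c_{OM}}'(Y_1) = (1-\delta)^2\frac{k_y^3}{(1+\theta^2+k_y^2)^{3/2}} + \delta(1-\delta)\frac{k_y^3}{(\sigma^2+\theta^2+k_y^2)^{3/2}} + \delta(1-\delta)\frac{k_y^3}{(1+\theta^2\sigma^2+k_y^2)^{3/2}} + \delta^2\frac{k_y^3}{((1+\theta^2)\sigma^2+k_y^2)^{3/2}} \tag{A.26}$$

As for $C(0)$:

$$C(0) = E_F\psi_{c_{OM}}^2(Y_1) = (1-\delta)^2\frac{(1+\theta^2)k_y^3}{[2(1+\theta^2)+k_y^2]^{3/2}} + \delta(1-\delta)\frac{(\sigma^2+\theta^2)k_y^3}{[2(\sigma^2+\theta^2)+k_y^2]^{3/2}} + \delta(1-\delta)\frac{(1+\theta^2\sigma^2)k_y^3}{[2(1+\theta^2\sigma^2)+k_y^2]^{3/2}} + \delta^2\frac{(1+\theta^2)\sigma^2 k_y^3}{[2\sigma^2(1+\theta^2)+k_y^2]^{3/2}} \tag{A.27}$$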


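As for $C(1)$, consider first the expectation conditioned on $\varepsilon_t$:

$$\begin{aligned}
E[\psi(Y_{t+1})\psi(Y_t) \mid \varepsilon_t]
&= E\left\{[\varepsilon_{t+1} + \theta\varepsilon_t]\exp[-(\varepsilon_{t+1} + \theta\varepsilon_t)^2/2k_y^2] \,\middle|\, \varepsilon_t\right\} \\
&\quad \cdot E\left\{[\varepsilon_t + \theta\varepsilon_{t-1}]\exp[-(\varepsilon_t + \theta\varepsilon_{t-1})^2/2k_y^2] \,\middle|\, \varepsilon_t\right\} \\
&= K_1(\varepsilon_t) \cdot K_2(\varepsilon_t)
\end{aligned} \tag{A.28}$$

where

$$K_1(\varepsilon_t) = \frac{(1-\delta)\theta k_y^3}{(1+k_y^2)^{3/2}}\,\varepsilon_t \exp[-\theta^2\varepsilon_t^2/2(1+k_y^2)] + \frac{\delta\theta k_y^3}{(\sigma^2+k_y^2)^{3/2}}\,\varepsilon_t \exp[-\theta^2\varepsilon_t^2/2(\sigma^2+k_y^2)]$$

and

$$K_2(\varepsilon_t) = \frac{(1-\delta)k_y^3}{(\theta^2+k_y^2)^{3/2}}\,\varepsilon_t \exp[-\varepsilon_t^2/2(\theta^2+k_y^2)] + \frac{\delta k_y^3}{(\theta^2\sigma^2+k_y^2)^{3/2}}\,\varepsilon_t \exp[-\varepsilon_t^2/2(\theta^2\sigma^2+k_y^2)]$$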

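Therefore,

$$\begin{aligned}
C(1) &= E_G[K_1(\varepsilon_t)K_2(\varepsilon_t)] \\
&= \theta k_y^6 \Big\{ (1-\delta)^3/[\theta^2(\theta^2+k_y^2) + (1+k_y^2) + (1+k_y^2)(\theta^2+k_y^2)]^{3/2} \\
&\quad + (1-\delta)^2\delta/[\theta^2(\theta^2\sigma^2+k_y^2) + (1+k_y^2) + (1+k_y^2)(\theta^2\sigma^2+k_y^2)]^{3/2} \\
&\quad + (1-\delta)^2\delta/[\theta^2(\theta^2+k_y^2) + (\sigma^2+k_y^2) + (\sigma^2+k_y^2)(\theta^2+k_y^2)]^{3/2} \\
&\quad + (1-\delta)\delta^2/[\theta^2(\theta^2\sigma^2+k_y^2) + (\sigma^2+k_y^2) + (\sigma^2+k_y^2)(\theta^2\sigma^2+k_y^2)]^{3/2} \\
&\quad + (1-\delta)^2\delta\sigma^2/[\theta^2\sigma^2(\theta^2+k_y^2) + \sigma^2(1+k_y^2) + (1+k_y^2)(\theta^2+k_y^2)]^{3/2} \\
&\quad + (1-\delta)\delta^2\sigma^2/[\theta^2\sigma^2(\theta^2\sigma^2+k_y^2) + \sigma^2(1+k_y^2) + (1+k_y^2)(\theta^2\sigma^2+k_y^2)]^{3/2} \\
&\quad + (1-\delta)\delta^2\sigma^2/[\theta^2\sigma^2(\theta^2+k_y^2) + \sigma^2(\sigma^2+k_y^2) + (\sigma^2+k_y^2)(\theta^2+k_y^2)]^{3/2} \\
&\quad + \delta^3\sigma^2/[\theta^2\sigma^2(\theta^2\sigma^2+k_y^2) + \sigma^2(\sigma^2+k_y^2) + (\sigma^2+k_y^2)(\theta^2\sigma^2+k_y^2)]^{3/2} \Big\}
\end{aligned} \tag{A.29}$$

Now (A.26), (A.27) and (A.29) can be combined to obtain the closed form for
$V_{OM}$.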


References 


Boente, G., Fraiman, R. and Yohai, V. (1982). "Qualitative robustness for general
stochastic processes," Tech. Rep. 26, Dept. of Statistics, Univ. of Washington,
Seattle, WA.

Box, G.E.P. and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and
Control, Holden-Day, San Francisco.

Cox, D.R. (1966). "The null distribution of the first serial correlation
coefficient," Biometrika 53, 623-626.

Cox, D. (1981). "Metrics on stochastic processes and qualitative robustness,"
Tech. Rep. 3, Dept. of Statistics, Univ. of Washington, Seattle, WA.

Feller, W. (1966). An Introduction to Probability Theory and its Applications,
Volume II, John Wiley, New York.

Grenander, U. (1954). "On the estimation of regression coefficients in the case
of an autocorrelated disturbance," Annals Math. Stat. 25, 252-272.

Grenander, U. (1981). Abstract Inference, John Wiley, New York.

Gupta, S.S. and Pillai, K.C.S. (1965). "On linear functions of ordered correlated
normal random variables," Biometrika 52, 367-379.

Hampel, F.R. (1971). "A general qualitative definition of robustness," Ann. Math.
Stat. 42, 1887-1895.

Huber, P.J. (1964). "Robust estimation of a location parameter," Annals of
Math. Stat. 35, 73-101.

Huber, P.J. (1981). Robust Statistics, Wiley.

Jong, J. (1977). "Robust generalized M-estimates for the autoregressive
parameter," Ph.D. dissertation, University of Washington, Seattle, WA.

Lee, C.H. and Martin, R.D. (1982a). "M-estimates for ARMA processes," Tech.
Rep. No. 23, Dept. of Statistics, Univ. of Washington, Seattle, WA.

Lee, C.H. and Martin, R.D. (1982b). "The information matrix and robust
M-estimates for ARMA models," Tech. Rep. No. 24, Dept. of Statistics, Univ. of
Washington, Seattle, WA.

Martin, R.D. (1981). "Robust methods for time series," in Applied Time Series
II, edited by D.F. Findley, Academic Press, New York.

Martin, R.D. (1982). "The Cramér-Rao lower bound and robust M-estimates for
autoregressions," Biometrika 69, 437-442.

Martin, R.D. and Yohai, V.J. (1984). "Robustness in time series and estimating
ARMA models," in Handbook of Statistics, Vol. 4, Time Series: The Time
Domain, edited by Brillinger and Krishnaiah.

Mosteller, F. and Tukey, J.W. (1977). Data Analysis and Regression, Addison
Wesley, Reading, MA.

Papantoni-Kazakos, P. and Gray, R.M. (1979). "Robustness of estimators on
stationary observations," Ann. Probability 7, 6:989-1002.

Portnoy, S.L. (1977). "Robust estimation in dependent situations," Ann. Statist.
5, 522-529.

Wegman, E.J. and Carroll, R.J. (1977). "A Monte Carlo study of robust estimators
of location," Comm. in Stat. - Theory and Methods A6, 795-812.


Figure 1a. AREFF versus $\sigma$ for MA(1) model using $\psi_\Phi$